feat: per-session pipelining via seq-ordered data ops #903
dazzling-no-more wants to merge 1 commit into therealaleph:main
Conversation
Reviewed via Anthropic Claude. Verified locally on top of v1.9.18.
Wire-protocol design looks careful. I'm hesitating on an immediate merge because of two factors:
Plan: leaving open for 5–7 days of community testing like we did with #359 (drive queue). If two or three users on different ISPs report it stable for streaming downloads + heavy concurrent browsing, I'll squash-merge as v1.9.19. For testers reading this: pull
Thanks @dazzling-no-more — this is the kind of structural perf work that's hard to land safely. The seq-lock + capability-negotiation design is exactly the right architecture for what is ultimately a not-quite-ordered transport layer.
Summary
Per-session pipelining for full-tunnel mode. Allows up to 2 in-flight `data` ops per session to overlap one Apps Script round-trip with the next on a single TCP stream, lifting the per-session throughput cap that came from strict request/await/request sequencing.

Wire protocol (forward-compatible — old peers ignore the new fields):
- `BatchOp.seq: Option<u64>` — per-session monotonic sequence number, sent only on `data` ops in pipelined sessions.
- `TunnelResponse.seq: Option<u64>` — echoed by the server.
- `TunnelResponse.caps: Option<u32>` — advertised on `connect` / `connect_data` success replies. Bit 0 = `CAPS_PIPELINE_SEQ`.
- `seq` is `u64` rather than `u32` so a long-lived TCP session generating ~100 ops/s doesn't saturate after ~1.4 years.
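For concreteness, a minimal sketch of what these additions could look like on the Rust side, assuming serde-derived protocol structs as elsewhere in the codebase; the surrounding fields are elided and the exact layout here is illustrative, not copied from the diff:

```rust
use serde::{Deserialize, Serialize};

/// Bit 0 of `TunnelResponse.caps`: the peer understands seq-ordered pipelining.
pub const CAPS_PIPELINE_SEQ: u32 = 1 << 0;

#[derive(Serialize, Deserialize)]
pub struct BatchOp {
    // ...existing op fields (kind, sid, payload, ...) elided...
    /// Per-session monotonic sequence number; set only on `data` ops in
    /// pipelined sessions. Old peers never emit it and ignore it on receipt.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub seq: Option<u64>,
}

#[derive(Serialize, Deserialize)]
pub struct TunnelResponse {
    // ...existing reply fields elided...
    /// Echo of the request's `seq`, if any.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub seq: Option<u64>,
    /// Capability bits advertised on `connect` / `connect_data` success replies.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub caps: Option<u32>,
}
```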
Tunnel-node (new):
- When a `data` op carries `seq`, the server processes the entire (write, drain) sequence under a per-session seq lock and only releases it after `expected` advances. Two ops can travel through different deployments and arrive out of order — the seq lock guarantees they're processed (and replied to) in send order, so a client reorder buffer never sees bytes from seq=N+1 land before seq=N's bytes.
- Old `data` ops without `seq` continue down the legacy unordered path.
- `connect` / `connect_data` success replies set `caps: Some(CAPS_PIPELINE_SEQ)` — old clients ignore the unknown field, new clients opt into pipelining.
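A rough sketch of the per-session ordering gate, using a tokio `watch` channel in place of whatever lock/notify combination the PR actually uses; `SEQ_WAIT_TIMEOUT` and the `expected` counter come from the PR text, while `SeqGate` and the exact `wait_for_seq_turn` signature are assumptions for illustration:

```rust
use std::time::Duration;
use tokio::sync::watch;

pub const SEQ_WAIT_TIMEOUT: Duration = Duration::from_secs(30);

/// Per-session gate: the op carrying sequence number N may run its
/// (write, drain) phase only once `expected` has reached N.
pub struct SeqGate {
    expected_tx: watch::Sender<u64>,
}

impl SeqGate {
    pub fn new() -> Self {
        Self { expected_tx: watch::channel(0).0 }
    }

    /// Block until it is this op's turn, or give up after SEQ_WAIT_TIMEOUT
    /// (e.g. an earlier seq was lost along with a dropped batch).
    pub async fn wait_for_seq_turn(&self, seq: u64) -> bool {
        let mut rx = self.expected_tx.subscribe();
        let turn = rx.wait_for(|&expected| expected >= seq);
        matches!(tokio::time::timeout(SEQ_WAIT_TIMEOUT, turn).await, Ok(Ok(_)))
    }

    /// Called once the op's drain finishes, and also from the write-failure
    /// path (the SeqAdvanceOnDrop idea), so a failed op never wedges later seqs.
    pub fn advance(&self, seq: u64) {
        self.expected_tx.send_replace(seq + 1);
    }
}
```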
Client (new):
- `tunnel_loop_pipelined` keeps up to `PIPELINE_DEPTH = 2` `data` ops in flight per pipelined session.
- Reordering is implicit — a `VecDeque<(seq, oneshot::Receiver)>` awaited in FIFO order means replies for later seqs sit in their oneshot buffers until our task gets to them.
- Idle sessions stay at depth=1 (server long-polls; no speculative quota burn).
- Pipelining only enables when the connect reply advertised `caps` AND no prior reply has been observed dropping the `seq` field (mixed-version backend protection).
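A compressed view of that FIFO-await pattern, assuming a reply type that carries the echoed `seq` and the downlink bytes; the real `tunnel_loop_pipelined` interleaves this with uplink reads, retries and close handling, none of which is shown here:

```rust
use std::collections::VecDeque;
use tokio::sync::oneshot;

const PIPELINE_DEPTH: usize = 2; // at most two data ops in flight per session

/// Stand-in for the fields of TunnelResponse this loop cares about.
struct Reply {
    seq: u64,
    bytes: Vec<u8>,
}

/// Replies are awaited strictly in send order: if the reply for seq=N+1
/// arrives first it simply sits in its oneshot buffer until seq=N has been
/// consumed, so bytes written back to the local TCP socket stay in order.
async fn drain_until_below_depth(
    in_flight: &mut VecDeque<(u64, oneshot::Receiver<Reply>)>,
    write_downlink: &mut impl FnMut(&[u8]),
) {
    while in_flight.len() >= PIPELINE_DEPTH {
        let (seq, rx) = in_flight.pop_front().expect("len checked above");
        if let Ok(reply) = rx.await {
            debug_assert_eq!(reply.seq, seq); // server echoes the seq it processed
            write_downlink(&reply.bytes);
        }
    }
}
```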
Cross-cutting correctness:
- A `BatchWait` primitive replaces N×M per-job watchers with one watcher per unique `SessionInner` (deduplicated by `Arc::as_ptr`). `reader_task` switched to `notify_waiters()` so concurrent batches all wake on each push.
- `pipelined_reply_timeout` is derived as `effective_batch * 2 + slack` — covers the permit-saturated semaphore wait plus the op's own request budget.
- Ops with `had_uplink == true` wait per-session up to `ACTIVE_DRAIN_DEADLINE` and run a straggler-settle loop, so multi-packet upstream responses land in one drain instead of being split across batches. Idle ops still subscribe to the shared `BatchWait` for cross-session wake.
- `close` ops are collected during dispatch and processed AFTER seq jobs ONLY when an earlier same-sid op is already deferred (else run inline) — preserves both `[data(seq), close]` (data writes before close tears the session down) AND `[close, data]` (close runs first, data hits the closed session).
- A `client_send_closed` flag in `tunnel_loop_pipelined` so a TCP half-close from the local client stops queuing new ops but keeps draining already-queued reply oneshots — a valid HTTP request/response with client-side shutdown no longer drops downlink bytes.
- Timeouts are plumbed through `tunnel_batch_request_with_timeout` (the 3-arg `tunnel_batch_request_to` is kept as a backward-compat wrapper) so both h2 (`h2_relay_request`) and h1 (`read_http_response_with_timeout`) honor the pipelined-batch floor — avoids the inner 30 s default firing before our outer 60 s budget on user-tuned configs.
- Phase 2's drain is capped at `TCP_DRAIN_MAX_BYTES` per iteration (mirrors the seq path), so concurrent seq jobs continue to see headroom while that drain is in flight.
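For the first bullet, a sketch of the dedup-by-pointer watcher idea. `SessionInner` is reduced here to just the `Notify` that `reader_task` hits with `notify_waiters()`, and `futures::future::select_all` is my stand-in for "wake on whichever session fires first", not necessarily what `BatchWait` does internally:

```rust
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::Notify;

/// Reduced stand-in for the PR's SessionInner: reader_task calls
/// `pushed.notify_waiters()` every time it buffers new downstream data.
pub struct SessionInner {
    pub pushed: Notify,
}

/// One watcher per unique session referenced by the batch (deduplicated by
/// Arc pointer identity) instead of one watcher per (job, session) pair.
pub async fn batch_wait(sessions: &[Arc<SessionInner>]) {
    let mut seen = HashSet::new();
    let waiters: Vec<_> = sessions
        .iter()
        .filter(|s| seen.insert(Arc::as_ptr(s)))  // dedupe by pointer identity
        .map(|s| Box::pin(s.pushed.notified()))   // one waiter each (registered on first poll)
        .collect();
    if waiters.is_empty() {
        return;
    }
    // Resolves as soon as any session's reader_task pushes; notify_waiters()
    // wakes every registered waiter, so concurrent batches all observe it.
    futures::future::select_all(waiters).await;
}
```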
Public API change
- `TunnelMux::udp_open` and `udp_data` already take `impl Into<Bytes>` (carried over from the prior zero-copy PR).
- `tunnel_batch_request_to` keeps its 3-arg signature; the new `tunnel_batch_request_with_timeout` exposes the explicit timeout for callers that need it.
- Public structs `TunnelResponse` and `BatchOp` gain new optional fields with `#[serde(default)]` / `#[serde(skip_serializing_if = "Option::is_none")]` — old wire shapes deserialize cleanly and serialize identically.
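To make the compatibility story concrete, a sketch of the wrapper relationship between the two entry points. The parameter list, the 30 s default and the transport stub are placeholders (the real signatures live in the client crate); only the "old name delegates to the new name with a default budget" shape is the point:

```rust
use std::io;
use std::time::Duration;
use tokio::time::timeout;

type Batch = Vec<u8>;
type BatchReply = Vec<u8>;

const DEFAULT_BATCH_TIMEOUT: Duration = Duration::from_secs(30);

// Placeholder for the actual HTTP round trip to the Apps Script backend.
async fn send_batch(url: &str, batch: Batch, attempt: u32) -> io::Result<BatchReply> {
    let _ = (url, batch, attempt);
    Ok(Vec::new())
}

/// New entry point: pipelined callers pass their own (larger) budget so an
/// inner default timeout can never fire before the outer pipelined-batch floor.
pub async fn tunnel_batch_request_with_timeout(
    url: &str,
    batch: Batch,
    attempt: u32,
    budget: Duration,
) -> io::Result<BatchReply> {
    timeout(budget, send_batch(url, batch, attempt))
        .await
        .map_err(|_| io::Error::new(io::ErrorKind::TimedOut, "batch request timed out"))?
}

/// Backward-compat wrapper: existing 3-arg call sites keep compiling and
/// keep the old default budget.
pub async fn tunnel_batch_request_to(url: &str, batch: Batch, attempt: u32) -> io::Result<BatchReply> {
    tunnel_batch_request_with_timeout(url, batch, attempt, DEFAULT_BATCH_TIMEOUT).await
}
```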
Realistic perf delta
For active streaming downloads on a single TCP session, expect a 30–80% throughput improvement depending on what fraction of the Apps Script RTT is server-side vs network. Less for idle / long-poll workloads (server-side seq enforcement serializes drains per session). No win on multi-session workloads (those were already parallel across sessions). Documented latency trade-off: a stuck seq op (a lost earlier seq from a dropped batch) holds the batch HTTP response open for up to `SEQ_WAIT_TIMEOUT = 30 s`; the pinned regression test `unrelated_seq_session_in_same_batch_is_not_delayed_past_seq_wait` ensures unrelated sessions' intrinsic processing time stays sub-second within that window.
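As an illustrative calculation (these figures are made up for the arithmetic, not measurements): if one op costs 600 ms end to end, 400 ms of which is server-side work that the per-session seq lock still serializes, strict request/await/request moves one op per 600 ms, while two overlapped ops are bounded by max(600/2, 400) = 400 ms per op, roughly a 50% gain. Shrink the server-side share and the depth-2 bound rises toward a full 2x; grow it and the gain falls toward zero.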
Test plan
- `cargo build --bins --lib` clean on both crates
- `cargo test --lib` (client) + `cargo test` (tunnel-node) pass — 216 + 57 = 273 total
- `wait_for_seq_turn`, write-failure path bumps `expected` (`SeqAdvanceOnDrop`), tail preservation under cap, batch budget shared with seq jobs, EOF withholding while buffer non-empty, `[data(seq), close]` runs data before close, `[close, data]` runs close first
- `BatchWait` wakes all jobs on a single push, dedupes watchers per unique inner, wakes across concurrent batches on the same session, mixed ready+idle batch latency, Phase 2 budget reservation capped at `TCP_DRAIN_MAX_BYTES`
- `seq=0` first, in-seq-order output when replies arrive reversed, closes on seq mismatch, closes on `seq=None` (mixed-version detection), drains in-flight on client half-close, `pipelined_reply_timeout` covers `2× batch_timeout`, `mark_pipelining_disabled` sticky and overrides caps
- (`connect_data` capability detection)
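For the "in-seq-order output when replies arrive reversed" case, the shape of such a test written against the plain `VecDeque` + `oneshot` pattern from the client sketch above, rather than the PR's actual test harness:

```rust
#[tokio::test]
async fn reversed_replies_still_drain_in_seq_order() {
    use std::collections::VecDeque;
    use tokio::sync::oneshot;

    let (tx0, rx0) = oneshot::channel::<Vec<u8>>();
    let (tx1, rx1) = oneshot::channel::<Vec<u8>>();
    let mut in_flight: VecDeque<(u64, oneshot::Receiver<Vec<u8>>)> =
        VecDeque::from([(0, rx0), (1, rx1)]);

    // The reply for seq=1 lands first; it just waits in its oneshot buffer.
    tx1.send(b"second".to_vec()).unwrap();
    tx0.send(b"first".to_vec()).unwrap();

    // Draining strictly FIFO by seq yields the bytes in send order regardless.
    let mut out = Vec::new();
    while let Some((_seq, rx)) = in_flight.pop_front() {
        out.push(rx.await.unwrap());
    }
    assert_eq!(out, vec![b"first".to_vec(), b"second".to_vec()]);
}
```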